First, create a GitHub account: visit GitHub and register for free with your BSU email address. Good news if you already have a GitHub account: just log in with your details!
Download and install Git on your local computer using this link. Choose your operating system and follow the directions given on the website.
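Once Git is installed, you can check from the R console that R (and therefore RStudio) can find it. This is a minimal sketch in base R: `Sys.which()` returns the path to the `git` program, or an empty string if it is not on your system PATH, and `system2()` runs `git --version`. If Git is not found right after installing it, restarting RStudio may help it pick up the new installation.

```r
# Look for the git executable on the system PATH.
# Sys.which() returns "" when the program cannot be found.
git_path <- Sys.which("git")
git_found <- nzchar(git_path)

if (git_found) {
  # Ask Git to report its version, capturing the output as text
  git_version <- system2("git", "--version", stdout = TRUE)
  message("Git found at: ", git_path)
  message(git_version)
} else {
  message("Git was not found on the PATH. Install it, then restart RStudio.")
}
```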
Throughout this semester, we have discussed making our science reproducible and accessible. We started by learning the concept of reproducible science and why it is so important. We also learned that there is a reproducibility crisis, what might be causing it, and how to address it. To write reproducible code, we learned about R Markdown files and how to format them in a clear, reproducible way. Next, we learned how to create bibliographies, appendices, tables, figures, citations, and references in R Markdown and how to adjust settings to display them well. We also learned how to create and use functions and lists in R and how to write code in a clear, reproducible, and defensive way.
Then, we discussed the scientific method and how to mitigate the threats to each step of the scientific process. One mitigation strategy is following the TOP (Transparency and Openness Promotion) Guidelines; another is using the OSF (Open Science Framework). Other mitigation strategies include the FAIR and CARE principles. The FAIR principles focus on making data findable, accessible, interoperable, and reusable for everyone, whereas the CARE principles apply specifically to data involving Indigenous Peoples and emphasize who benefits from the data, who has the authority to collect and control the data, and the responsibilities and ethics surrounding the data. The two sets of principles can pull in different directions (FAIR toward open sharing, CARE toward Indigenous control over data), but both are equally important to consider when conducting research and before sharing data. Connected to this was the concept of Creative Commons licenses and which one would be best to use for your own research.
Finally, we learned about data management: that it is best to create a data management plan before starting research, and how to manage your data so that it will continue to be usable by you, your colleagues, and the general public, as long as such access has been checked against the CARE principles.
One of the final pieces of making your science reproducible is making sure your data and code stay accessible far into the future, while keeping the FAIR and CARE principles in mind. In this tutorial, we will introduce Git and GitHub, which will help make your science more reproducible.
Git is a version control system: it keeps track of the changes made to the files stored within it and allows us to return to previous versions. GitHub is a cloud hosting service built on top of Git. It can store your files remotely, solving the problem of where to keep your data and code. However, it was built for computer programmers rather than specifically for researchers, so at first glance GitHub can be challenging to understand and use. With some time, GitHub's many helpful features become clearer and can help us conduct more reproducible research.
One of these features is data storage and access. When you create a project, which GitHub calls a repository, you can store data and code along with a README.md file that makes your file structure even clearer. This is particularly useful when several collaborators are working on the same project. GitHub allows files to be shared and edited, and it tracks who edited which files and what they changed. If the repository is public, researchers unconnected to the project can also suggest changes to files, which provides further checks that help make science more reproducible. GitHub will keep every committed version of your files, for free, and it connects to RStudio Projects, which we will discuss today. We created this tutorial with information from (Gandrud, 2020).
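To make "keeps track of changes and lets us return to previous versions" concrete, here is a toy model in base R. It is only an illustration, not how Git works internally: each commit records who made it, a message, and a snapshot of the file, and any earlier snapshot can still be retrieved. The helpers `commit()` and `checkout()` are hypothetical and simply borrow Git's vocabulary.

```r
# Toy model of a version history in base R. This is NOT how Git stores data
# internally; it only illustrates the idea of commits and earlier versions.
history <- list()

# Hypothetical helper: add a commit (author, message, snapshot of the file)
commit <- function(history, contents, message, author) {
  entry <- list(
    id = length(history) + 1,
    author = author,
    message = message,
    contents = contents
  )
  c(history, list(entry))
}

# Hypothetical helper: return the file contents as of a given commit
checkout <- function(history, id) {
  history[[id]]$contents
}

# Two collaborators edit the same README over time
history <- commit(history, "Sites: A, B", "Create README", author = "Partner A")
history <- commit(history, "Sites: A, B, C", "Add site C", author = "Partner B")
history <- commit(history, "Sites: A, C", "Remove site B", author = "Partner A")

# Latest version of the file
checkout(history, length(history))

# An earlier version is still recoverable
checkout(history, 1)

# Who changed what, like a very short `git log`
data.frame(
  id = vapply(history, function(x) x$id, numeric(1)),
  author = vapply(history, function(x) x$author, character(1)),
  message = vapply(history, function(x) x$message, character(1))
)
```

Storing a full snapshot per commit keeps the toy simple; the point is only that no earlier version is ever overwritten.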
On Day 1, we will learn Key Terms for work in Git and GitHub, make a new repository in GitHub and clone it into a new RStudio Git version control project, and work with partners to practice working with GitHub.
On Day 2, we will look at real, active GitHub repositories that are good examples of how your own GitHub repository might look in the future! We will practice importing data from one of them into RStudio and uploading data into your own GitHub repository.
Before starting, make sure RStudio's terminal is set up for Git: go to Tools → Global Options → Terminal, and on the General tab set "New terminals open with" to Git Bash (on Windows, Git Bash is installed along with Git), then click Apply.
As we work today, and for your future Git adventures, here are a few useful terms and their definitions. You can use the search bar embedded in the table to search for terms, or for any words in the definitions, for further clarification and assistance. Pay particular attention to the definitions of Commit, Commit message, Pull, Push, and Repository.
Now that you have created a GitHub account, it is time to make your first repository! To do so,
For the Configuration options:
Just as you created a new project in RStudio at the beginning of this class (and as you can any time you start a new project), you can create a new RStudio project that is linked to a remote repository on GitHub.
Figure 3.1: Cloning Repository from GitHub into RStudio
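When RStudio's File > New Project > Version Control > Git dialog asks for the Repository URL, it expects the repository's clone URL, which you can copy from the green Code button on GitHub. For HTTPS it follows the pattern `https://github.com/OWNER/REPO.git`. The small helper below is hypothetical and only builds that string. As an alternative, the usethis package (used for this tutorial) provides `usethis::create_from_github()`, which clones a repository into a new RStudio project; it needs network access and may need your GitHub credentials configured first.

```r
# Hypothetical helper: build the HTTPS clone URL that RStudio's
# "New Project > Version Control > Git" dialog asks for.
github_clone_url <- function(owner, repo) {
  sprintf("https://github.com/%s/%s.git", owner, repo)
}

# Placeholder owner and repository names; replace them with your own
clone_url <- github_clone_url("YOUR-USERNAME", "YOUR-REPOSITORY")
clone_url

# Alternative (requires the usethis package and network access):
# usethis::create_from_github("YOUR-USERNAME/YOUR-REPOSITORY", destdir = "~/projects")
```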
If you are the owner of the repository and have push notifications turned on, you can see when you and your collaborators make changes. This is useful for tracking and knowing what your collaborators are working on. To turn on push notifications:
Unfortunately, only the admin or owner of the repository receives notifications when changes are made. So far, we have not found a way for collaborators to receive these notifications. We tried having the collaborator use the “Watch” button in the upper right to receive these messages, but that did not seem to work. This remains to be explored, as it could be a useful tool for remote communication.
By now, you should have created a repository of your own and linked it to a matching project in RStudio. The next step in working with GitHub is to practice working with a collaborator.
In pairs, you will:
Choose one partner to be the admin and the other to be the collaborator.
The admin goes to the Settings of their repository, then clicks on Collaborators in the upper-left area of the settings. You’ll be prompted to log in; once you do, you’ll reach the Collaborators screen.
Once each of you has the repository open in RStudio, one partner goes first: write a sentence in README.md, save your local file, commit the change (Figure 3.2), and push the change to the remote GitHub repository. The other partner then pulls that change with the Pull (down arrow) button and adds their own sentence.
Repeat this exercise until you each have written three sentences (or one each, if time is short).
Figure 3.2: Interface when pushing and committing changes
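Behind the scenes, RStudio's Git pane runs ordinary Git commands. For reference, here is a hedged sketch of the same save, commit, push, and pull cycle driven from R with base `system2()`. The helper names `run_git()`, `share_changes()`, and `get_changes()` are hypothetical; the sketch assumes Git is installed, the working directory is inside your cloned project, and your GitHub credentials are already set up so that `git push` does not need to prompt you. The same `git add`, `git commit`, `git push`, and `git pull` commands can also be typed directly into the Git Bash terminal.

```r
# Hypothetical helpers mirroring the Git pane's Commit, Push and Pull buttons.
# Assumes: Git is installed, the working directory is inside your cloned
# repository, and GitHub credentials are already configured.

# Run one git command and stop with an informative error if it fails
run_git <- function(...) {
  args <- c(...)
  status <- system2("git", args)  # returns the exit status (0 = success)
  if (!identical(as.integer(status), 0L)) {
    stop("`git ", paste(args, collapse = " "), "` failed with status ", status)
  }
  invisible(status)
}

# Stage one file, commit it with a message, and push it to GitHub
share_changes <- function(file = "README.md", message = "Add a sentence to README") {
  run_git("add", shQuote(file))
  run_git("commit", "-m", shQuote(message))  # shQuote() protects spaces
  run_git("push")
}

# Download and merge your partner's latest commits
get_changes <- function() {
  run_git("pull")
}

# One turn of the partner exercise (run inside your project):
# get_changes()
# share_changes("README.md", "Add my first sentence")
```

Pulling before you commit, as in the usage lines, reduces the chance that your push is rejected because your partner pushed first.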
On Day 1, we learned Key Terms for working in Git and GitHub, made a new repository in GitHub and cloned it into a new RStudio Git version control project, and worked with partners to practice making changes, committing them, pushing them to the remote GitHub repository, and pulling changes from the remote GitHub repository into our local RStudio project files.
Today, on Day 2, we will explore real, active GitHub repositories that show how your own repository might look in the future, practice importing data from one of them into RStudio, and upload data into your own GitHub repository.
As you read more papers, interact with collaborators, and become a scientist doing highly reproducible science, you will likely need to bring data from a public GitHub repository onto your own computer and into R.
One example of a public repository is Sven’s repository for this class, Reproducible 603. Following the link, you can see his repository, with all of its folders and files displayed in a list. If you scroll to the bottom, you can see the README with information about this repository and class.
Another example of a public repository is one that was made as part of a paper submission. Fiona read a cool microbial ecology paper that tracked the ecological and evolutionary responses of Curtobacterium along an elevation gradient in Southern California (Chase et al., 2021). The paper includes a Data Availability section. Following its link to the authors' GitHub repository, we can see a similar set of folders and files, as well as a README.
Download the data file by clicking on the Download raw file button. It will now be on your computer.
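If you would rather stay in R than use the browser, base R's `download.file()` can save a copy of the raw file directly into your project. In this sketch, the helper name `download_raw_file()` and the `data` folder are assumptions about your project layout, and the URL shown is a placeholder: use the address of the raw file (the page the Raw button opens), which typically starts with `https://raw.githubusercontent.com/`.

```r
# Hypothetical helper: save a copy of a raw file from GitHub into a local folder
download_raw_file <- function(raw_url, dest_dir = "data") {
  # Create the destination folder if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  # Keep the original file name (the last part of the URL)
  dest_file <- file.path(dest_dir, basename(raw_url))
  # mode = "wb" writes in binary mode, so the file is copied unchanged on Windows
  download.file(raw_url, destfile = dest_file, mode = "wb", quiet = TRUE)
  dest_file
}

# Placeholder URL: replace OWNER, REPO, BRANCH and the path with real values
# download_raw_file("https://raw.githubusercontent.com/OWNER/REPO/BRANCH/data/file.csv")
```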
Practice uploading data to your own GitHub repository: return to your repository, click the Add file button next to the green Code button, then choose Upload files from the drop-down menu.
You can drag files into the box or click to choose your files. Make sure to commit your changes in the section below the box.
To import data into RStudio, go to your repository in GitHub.
Find the data file you just uploaded and open it by clicking on it in your repository's list of files.
Click on the Raw button in the upper right of the file view.
This opens a page showing only the file contents; the URL in your browser's address bar is what you need to copy.
Back in RStudio, open a new R script and run these lines of code:
```r
# Paste the raw-file URL between the quotes
url <- "PASTE HERE"

# Download the data and read it in as a data frame
YOUR_data <- rio::import(url, format = "csv")
```
You should now see YOUR_data in the Environment pane, ready to be used!
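If rio is not installed, base R's `read.csv()` (from the utils package, which R loads by default) can also read a CSV directly from a raw GitHub URL. A quick structural check right after importing helps confirm that you read the file you expected. The helper name `import_and_check()` is hypothetical, and the sketch assumes the file is comma-separated with a header row.

```r
# Hypothetical helper: read a CSV from a URL or local path and report its shape
import_and_check <- function(source) {
  dat <- read.csv(source)
  message("Read ", nrow(dat), " rows and ", ncol(dat), " columns: ",
          paste(names(dat), collapse = ", "))
  dat
}

# Usage with the raw-file URL copied from GitHub (placeholder shown here):
# YOUR_data <- import_and_check("PASTE HERE")
# head(YOUR_data)
```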
As a final send-off to all of you (and any future users of this tutorial), here is a list of resources we used to build this tutorial.
For the latest version of Gandrud’s textbook, visit this link, which will allow you to search Boise State University’s library and download an ebook: Reproducible Research with R and RStudio (3rd edition).
Citations of all R packages used to generate this report:

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. https://github.com/rstudio/rmarkdown.
[2] S. M. Bache and H. Wickham. magrittr: A Forward-Pipe Operator for R. R package version 2.0.3. 2022. https://magrittr.tidyverse.org.
[3] J. Becker, C. Chan, D. Schoch, et al. rio: A Swiss-Army Knife for Data I/O. R package version 1.2.4. 2025. https://gesistsa.github.io/rio/.
[4] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.
[5] C. Chan, T. J. Leeper, J. Becker, et al. rio: A Swiss-army knife for data file I/O. 2023. https://cran.r-project.org/package=rio.
[6] J. Cheng, C. Sievert, B. Schloerke, et al. htmltools: Tools for HTML. R package version 0.5.8.1. 2024. https://github.com/rstudio/htmltools.
[7] R. Francois and D. Hernangómez. bibtex: Bibtex Parser. R package version 0.5.1. 2023. https://github.com/ropensci/bibtex.
[8] G. Grolemund and H. Wickham. “Dates and Times Made Easy with lubridate”. In: Journal of Statistical Software 40.3 (2011), pp. 1-25. https://www.jstatsoft.org/v40/i03/.
[9] K. Müller and H. Wickham. tibble: Simple Data Frames. R package version 3.2.1. 2023. https://tibble.tidyverse.org/.
[10] Y. Qiu. prettydoc: Creating Pretty Documents from R Markdown. R package version 0.4.1. 2021. https://github.com/yixuan/prettydoc.
[11] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. https://www.R-project.org/.
[12] K. Ren and K. Russell. formattable: Create Formattable Data Structures. R package version 0.2.1. 2021. https://renkun-ken.github.io/formattable/.
[13] V. Spinu, G. Grolemund, and H. Wickham. lubridate: Make Dealing with Dates a Little Easier. R package version 1.9.3. 2023. https://lubridate.tidyverse.org.
[14] H. Wickham. forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.0. 2023. https://forcats.tidyverse.org/.
[15] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. https://ggplot2.tidyverse.org.
[16] H. Wickham. stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.5.1. 2023. https://stringr.tidyverse.org.
[17] H. Wickham. tidyverse: Easily Install and Load the Tidyverse. R package version 2.0.0. 2023. https://tidyverse.tidyverse.org.
[18] H. Wickham, M. Averick, J. Bryan, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.
[19] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.1.0. 2024. https://usethis.r-lib.org.
[20] H. Wickham, W. Chang, L. Henry, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.5.2. 2025. https://ggplot2.tidyverse.org.
[21] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.1.4. 2023. https://dplyr.tidyverse.org.
[22] H. Wickham and L. Henry. purrr: Functional Programming Tools. R package version 1.0.2. 2023. https://purrr.tidyverse.org/.
[23] H. Wickham, J. Hester, and J. Bryan. readr: Read Rectangular Text Data. R package version 2.1.5. 2024. https://readr.tidyverse.org.
[24] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5. 2022. https://devtools.r-lib.org/.
[25] H. Wickham, D. Vaughan, and M. Girlich. tidyr: Tidy Messy Data. R package version 1.3.1. 2024. https://tidyr.tidyverse.org.
[26] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC, 2016. ISBN: 978-1138700109. https://bookdown.org/yihui/bookdown.
[27] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.44. 2025. https://github.com/rstudio/bookdown.
[28] Y. Xie. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC, 2015. ISBN: 978-1498716963. https://yihui.org/knitr/.
[29] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. Chapman and Hall/CRC, 2014. ISBN: 978-1466561595.
[30] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. https://yihui.org/knitr/.
[31] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.
[32] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.
[33] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.4.0. 2024. http://haozhu233.github.io/kableExtra/.
Version information about R, the operating system (OS), and attached or loaded R packages. This appendix was generated using sessionInfo().
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rio_1.2.4 lubridate_1.9.3 forcats_1.0.0
## [4] stringr_1.5.1 purrr_1.0.2 readr_2.1.5
## [7] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.2
## [10] tidyverse_2.0.0 devtools_2.4.5 usethis_3.1.0
## [13] bibtex_0.5.1 knitcitations_1.0.12 htmltools_0.5.8.1
## [16] prettydoc_0.4.1 magrittr_2.0.3 dplyr_1.1.4
## [19] kableExtra_1.4.0 formattable_0.2.1 bookdown_0.44
## [22] rmarkdown_2.29 knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.53 bslib_0.8.0 remotes_2.5.0
## [5] htmlwidgets_1.6.4 tzdb_0.4.0 vctrs_0.6.5 tools_4.4.1
## [9] generics_0.1.3 fansi_1.0.6 RefManageR_1.4.0 pkgconfig_2.0.3
## [13] lifecycle_1.0.4 compiler_4.4.1 textshaping_0.4.0 munsell_0.5.1
## [17] codetools_0.2-20 httpuv_1.6.15 sass_0.4.9 yaml_2.3.10
## [21] urlchecker_1.0.1 pillar_1.9.0 later_1.4.1 jquerylib_0.1.4
## [25] ellipsis_0.3.2 cachem_1.1.0 sessioninfo_1.2.3 mime_0.12
## [29] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.4 grid_4.4.1
## [33] fastmap_1.2.0 colorspace_2.1-1 cli_3.6.3 pkgbuild_1.4.7
## [37] utf8_1.2.4 withr_3.0.2 scales_1.3.0 promises_1.3.2
## [41] backports_1.5.0 timechange_0.3.0 httr_1.4.7 hms_1.1.3
## [45] memoise_2.0.1 shiny_1.10.0 evaluate_1.0.3 miniUI_0.1.1.1
## [49] viridisLite_0.4.2 profvis_0.4.0 rlang_1.1.4 Rcpp_1.0.13
## [53] xtable_1.8-4 glue_1.7.0 xml2_1.3.6 pkgload_1.4.0
## [57] svglite_2.2.1 rstudioapi_0.16.0 jsonlite_1.8.8 R6_2.5.1
## [61] plyr_1.8.9 systemfonts_1.3.1 fs_1.6.4